Fix proactive scale up injecting fake pods for scheduling gated pods #8580
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: abdelrahman882. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
```go
for _, podOwnerRef := range pod.OwnerReferences {
	// SchedulingGated pods can't be unschedulable, unprocessed, or scheduled, so they are
	// not expected to appear as a group sample or in the pod count; decreasing
	// desiredReplicas by one is enough.
	if grp, found := groups[podOwnerRef.UID]; found {
```
This should only subtract when `podOwnerRef.Controller != nil && *podOwnerRef.Controller`, to be in sync with `updatePodGroups` below. But actually, instead of adding replicas and then subtracting them here, wouldn't it suffice to pass scheduling gated pods to `groupPods`? We're already passing a combined list of scheduled and unschedulable pods there.
tl;dr: if we passed scheduling gated pods to `groupPods` along with the other pods, they would be added to the pod count and we wouldn't inject fake pods for them.

> This should only subtract when `podOwnerRef.Controller != nil && *podOwnerRef.Controller` to be in sync with `updatePodGroups` below

Not really; I believe it would be a redundant check, because if the UID exists in the `groups` map then it refers to the same object that was already checked to be a controller. I will add it as a double check, though.

> But actually, instead of adding replicas and then subtracting them here, wouldn't it suffice to pass scheduling gated pods to `groupPods`? We're already passing a combined list of scheduled and unschedulable pods there.

We are not adding replicas and then subtracting them; we add to the pod count twice and subtract from the desired replicas. The only reason I am doing this after grouping is to avoid listing and filtering to find gated pods in case we don't have any groups.
Mostly looks good, just one last comment.
```go
var podsToInject []*apiv1.Pod
allPods, err := ctx.AllPodLister().List()
if err != nil {
	klog.Errorf("Failed to list all pods from all pod lister: %v", err)
```
This is redundant with the error returned - we'll end up logging the error twice. I'd just return the error and let the caller do the logging.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Proactive scale up lists the controllers, checks their desired replicas, and injects fake pods to proactively scale up even before these pods are created and marked as unschedulable.

Proactive scale up uses the following formula for the number of fake pods to inject per controller:

Number of fake pods = number of controller desired pods - (scheduled pods + unschedulable pods + unprocessed pods)

The problem is that scheduling gated pods are not excluded along with unschedulable and unprocessed pods, so proactive scale up injects fake pods for these gated pods, ignoring the gate condition. This happens every loop, which leads to never scaling down that empty space.

This PR subtracts the number of pods with scheduling gates from the number of fake pods to inject.
Which issue(s) this PR fixes:
NONE
Special notes for your reviewer:
`ctx.AllPodLister().List()` should be using a lister, so we don't make an extra API call here to list the pods; we just get them from cache.

In `PodInjectionPodListProcessor`, in case of any error while filtering out scheduling gated pods, I chose to only log a warning and not return an error, so it doesn't block the rest of the processors: `CombinedPodListProcessor` stops processing if any one processor returns an error, which means any following processor (all others) would not be executed.

Does this PR introduce a user-facing change?